90 research outputs found

    Efficient processing of large-scale spatio-temporal data

    Millions of location-aware devices, such as mobile phones, cars, and environmental sensors, constantly report their positions, often together with a timestamp and further payload data, to a server for various kinds of analyses. While the location information of the devices and reported events is represented as points and polygons, another type of spatial data is raster data, which is produced, for example, by cameras and sensors. These big spatio-temporal data sets can only be processed on scalable platforms such as Hadoop and Apache Spark, which, however, are unaware of properties such as spatial neighborhood, which makes the execution of certain queries practically impossible. Moreover, the repeated executions of analysis programs during development and by different users result in long execution times and potentially high costs for rented cluster resources, which can be reduced by reusing commonly computed intermediate results.
In this thesis, we tackle the two challenges described above. First, we present the STARK framework for processing spatio-temporal vector and raster data on the Apache Spark stack. For its operators, we identify several possible algorithms and study how they can benefit from the properties of the underlying platform. We further investigate how indexes can be realized in the distributed and parallel architecture of Big Data processing engines, and we compare methods for data partitioning, which cope differently well with data skew and data set size. Furthermore, we present an approach to reduce the amount of data to be processed at the operator level early on.
To shorten execution times, we introduce an approach to transparently materialize and recycle intermediate results of dataflow programs, based on a decision model that uses the actual operator costs. To compute these costs, we instrument the programs with profiling code that gathers the execution time and result size of each operator. In the evaluation, we first compare the various implementation and configuration options in STARK and identify scenarios in which partitioning and indexing should be applied. We further compare STARK to related systems and show that we achieve significantly better execution times, not only when exploiting existing partitioning information. In the second part of the evaluation, we show that the transparent cost-based materialization and recycling of intermediate results can reduce the execution times of programs significantly.
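The cost-based decision of whether to materialize an intermediate result can be sketched in a few lines. This is a minimal illustration under assumed I/O throughput numbers and a made-up cost formula, not the thesis' actual decision model:

```python
# Hypothetical sketch: materialize an intermediate result when the recomputation
# time saved across expected reuses outweighs the one-time cost of writing it.
# The throughput constants and the profile fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class OperatorProfile:
    compute_secs: float    # measured execution time of the operator (from profiling)
    result_bytes: int      # measured size of its intermediate result
    expected_reuses: int   # how often we expect the result to be read again

def should_materialize(p: OperatorProfile,
                       write_mb_per_sec: float = 100.0,
                       read_mb_per_sec: float = 200.0) -> bool:
    """Materialize if the time saved over all reuses exceeds the write cost."""
    mb = p.result_bytes / 1e6
    write_cost = mb / write_mb_per_sec                       # one-time cost to store
    read_cost = mb / read_mb_per_sec                         # cost to load per reuse
    saved = p.expected_reuses * (p.compute_secs - read_cost)
    return saved > write_cost

# An expensive operator with a small result is a clear candidate:
decision = should_materialize(OperatorProfile(120.0, 50_000_000, 3))
```

With these assumed numbers, a two-minute operator producing 50 MB that is reused three times would be materialized, whereas a cheap operator with a huge result would not.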

    Efficient spatio-temporal event processing with STARK

    Apache Spark has become widely accepted for Big Data processing. However, when dealing with events or any other spatio-temporal data sets, Spark becomes very inefficient, as it does not include any spatial or temporal data types and operators. In this paper we demonstrate our STARK project, which adds the required data types and operators, such as spatio-temporal filter and join with various predicates, to Spark. Additionally, it includes k-nearest-neighbor search and a density-based clustering operator for data analysis tasks, as well as spatial partitioning and indexing techniques for efficient processing. During the demo, programs can be created on real-world event data sets using STARK's Scala API or our Pig Latin derivative Piglet in a web front end, which also visualizes the results.
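In plain, non-distributed Python, the two kinds of operators the demo revolves around — a spatio-temporal filter and a k-nearest-neighbor search — can be sketched roughly like this. The record layout and query values are made up for illustration; STARK implements these as distributed Spark operators in Scala:

```python
# Plain-Python sketch of a combined space/time filter and a k-nearest-
# neighbour search over (x, y, t) event records. All data is hypothetical.
import math

events = [
    {"x": 13.40, "y": 52.52, "t": 100},  # hypothetical Berlin-area event
    {"x": 13.38, "y": 52.51, "t": 220},  # hypothetical Berlin-area event
    {"x": 11.58, "y": 48.14, "t": 150},  # hypothetical Munich-area event
]

def st_filter(recs, bbox, t_range):
    """Keep records inside a spatial bounding box and a time interval."""
    (x1, y1, x2, y2), (t1, t2) = bbox, t_range
    return [r for r in recs
            if x1 <= r["x"] <= x2 and y1 <= r["y"] <= y2 and t1 <= r["t"] <= t2]

def knn(recs, qx, qy, k):
    """The k records nearest to (qx, qy) by Euclidean distance."""
    return sorted(recs, key=lambda r: math.hypot(r["x"] - qx, r["y"] - qy))[:k]

berlin_events = st_filter(events, (13.0, 52.0, 14.0, 53.0), (0, 300))
nearest = knn(events, 13.39, 52.51, 1)
```

A real implementation would push both operators through spatial partitioning and per-partition indexes rather than scanning a list.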

    Big spatial data processing frameworks: feature and performance evaluation: experiments & analyses

    Nowadays, a vast amount of data is generated and collected every moment, and this data often has a spatial and/or temporal aspect. To analyze these massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged, and extensions were created for them that take the spatial characteristics into account. In this paper, we analyze and compare existing solutions for spatial data processing on Hadoop and Spark. In our comparison, we investigate their features as well as their performance in a micro benchmark for spatial filter and join queries. Based on the results and our experiences with these frameworks, we outline the requirements for a general spatio-temporal benchmark for Big Spatial Data processing platforms and sketch first solutions to the identified problems.
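The shape of such a micro benchmark for a spatial filter query can be illustrated with a small, self-contained sketch; the workload size, query window, and timing harness are arbitrary stand-ins, not the paper's actual benchmark:

```python
# Illustrative micro benchmark for a spatial range (bounding-box) filter:
# generate random points, run the filter, and measure wall-clock time.
# Sizes and the query window are arbitrary assumptions.
import random
import time

random.seed(42)  # deterministic workload
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50_000)]

def bbox_filter(pts, x1, y1, x2, y2):
    """Return all points inside the axis-aligned query rectangle."""
    return [p for p in pts if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]

start = time.perf_counter()
hits = bbox_filter(points, 10, 10, 20, 20)   # window covers ~1% of the area
elapsed = time.perf_counter() - start
print(f"filter: {len(hits)} hits in {elapsed * 1000:.1f} ms")
```

A framework benchmark would run the same query shape against each system under test and compare wall-clock times across data sizes.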

    Putting Pandas in a Box

    Pandas - the Python Data Analysis Library - is a powerful and widely used framework for data analytics. In this work we present our approach to push down the computational part of Pandas scripts into the DBMS by using a transpiler. In addition to basic data processing operations, our approach also supports access to external data stored in files instead of the DBMS. Moreover, user-defined Python functions are transformed automatically into SQL UDFs executed in the DBMS. The latter allows the integration of complex computational tasks, including machine learning. We show how this feature can be used to implement a so-called model join, i.e., applying pre-trained ML models to data in SQL tables.
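The push-down idea can be illustrated with a deliberately tiny translation rule: a Pandas-style filter plus projection maps to a SELECT with a WHERE clause, so the DBMS evaluates it instead of Python. The function below is a toy assumption; a real transpiler covers far more of the Pandas API:

```python
# Toy illustration of the push-down idea: map a Pandas-style
# "df[df[column] <op> value][projection]" expression to an equivalent SQL
# query, so filter and projection run inside the DBMS. Names are hypothetical.
def pandas_filter_to_sql(table, column, op, value, projection):
    """df[df[column] <op> value][projection] -> SELECT ... FROM ... WHERE ..."""
    cols = ", ".join(projection)
    return f"SELECT {cols} FROM {table} WHERE {column} {op} {value}"

sql = pandas_filter_to_sql("orders", "amount", ">", 100, ["id", "amount"])
# sql == "SELECT id, amount FROM orders WHERE amount > 100"
```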

    Precise coupling terms in adiabatic quantum evolution: The generic case

    For multi-level time-dependent quantum systems one can construct superadiabatic representations in which the coupling between separated levels is exponentially small in the adiabatic limit. Based on results from [BeTe1] for special Hamiltonians, we explicitly determine the asymptotic behavior of the exponentially small coupling term for generic two-state systems with real-symmetric Hamiltonian. The superadiabatic coupling term takes a universal form and depends only on the location and the strength of the complex singularities of the adiabatic coupling function. As shown in [BeTe1], first-order perturbation theory in the superadiabatic representation then allows one to describe the time development of exponentially small adiabatic transitions and thus to rigorously confirm Michael Berry's [Ber] predictions on the universal form of adiabatic transition histories.
Comment: 30 pages, 1 figure

    Processing large raster and vector data in apache spark

    In many cases, spatial data processing frameworks are limited to vector data. However, an important type of spatial data is raster data, which is produced by sensors on satellites, but also by high-resolution cameras taking pictures of nano structures such as chips on wafers. Often the raster data sets become large and need to be processed in parallel in a cluster environment. In this paper we demonstrate our STARK framework with its support for raster data and its functionality to combine raster and vector data in filter and join operations. To save engineers from the burden of learning a programming language, queries can be formulated in SQL in a web interface. In the demonstration, users can use this web interface to inspect examples of raster data using our extended SQL queries on an Apache Spark cluster.
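A raster/vector filter of the kind demonstrated can be sketched in a few lines: keep only the raster cells whose centre lies inside a query region, simplified here to an axis-aligned rectangle. Cell layout and all names are illustrative assumptions, not STARK's raster representation:

```python
# Sketch of a raster/vector filter: select the raster cells whose centre
# falls inside a rectangular query region. Geometry and cell size are
# made-up assumptions for illustration.
def cells_in_region(raster, origin, cell_size, region):
    """raster: 2D list of values; origin: (x0, y0) of cell (0, 0);
    region: (x1, y1, x2, y2) query rectangle."""
    x0, y0 = origin
    x1, y1, x2, y2 = region
    hits = []
    for row_i, row in enumerate(raster):
        for col_i, val in enumerate(row):
            # centre coordinates of this cell in world space
            cx = x0 + (col_i + 0.5) * cell_size
            cy = y0 + (row_i + 0.5) * cell_size
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                hits.append(((row_i, col_i), val))
    return hits

raster = [[1, 2], [3, 4]]
inside = cells_in_region(raster, (0.0, 0.0), 10.0, (0.0, 0.0, 10.0, 10.0))
```

A raster/vector join generalizes this by testing cells against arbitrary polygons instead of one rectangle, typically after spatially partitioning both inputs.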

    Interactions of multi-quark states in the chromodielectric model

    We investigate 4-quark ($qq\bar{q}\bar{q}$) systems as well as multi-quark states with a large number of quarks and anti-quarks using the chromodielectric model. For the former type of system, we study the flux distribution and the corresponding energy for planar and non-planar geometries. From the comparison to the case of two independent $q\bar{q}$-strings we deduce the interaction potential between two strings. We find an attraction between strings and a characteristic string flip if there are two degenerate string combinations between the four particles. The interaction shows no strong Van der Waals forces, and the long-range behavior of the potential is well described by a Yukawa potential, which might be confirmed in future lattice calculations. The multi-quark states develop an inhomogeneous porous structure even for particle densities large compared to nuclear-matter constituent quark densities. We present first results on the dependence of the system on the particle density, pointing towards a percolation-type transition from a hadronic matter phase to a quark matter phase. The critical energy density is found at $\epsilon_c = 1.2\,\mathrm{GeV/fm^3}$.
Comment: 19 pages, 40 eps-figures, RevTex 4, v2: typos corrected

    The time-dependent Born-Oppenheimer approximation

    We explain why the conventional argument for deriving the time-dependent Born-Oppenheimer approximation is incomplete and review recent mathematical results, which clarify the situation and at the same time provide a systematic scheme for higher-order corrections. We also present a new elementary derivation of the correct second-order time-dependent Born-Oppenheimer approximation and discuss, as applications, the dynamics near a conical intersection of potential surfaces and reactive scattering.
Comment: 17 pages, no figures

    Feedback between erosion and active deformation: geomorphic constraints from the frontal Jura fold-and-thrust belt (eastern France)

    A regional tectono-geomorphic analysis indicates a Pliocene to recent rock uplift of the outermost segment of the Jura fold-and-thrust belt, which spatially coincides with the intra-continental Rhine-Bresse Transfer Zone. Elevated remnants of the partly eroded Middle Pliocene Sundgau-ForĂȘt de Chaux Gravels, identified by heavy mineral analyses, allow for a paleo-topographic reconstruction that yields minimum regional Latest Pliocene to recent rock uplift rates of 0.05 ± 0.02 mm/year. This uplift also affected the Pleistocene evolution of the Ognon and Doubs drainage basins and is interpreted as being tectonically controlled. While the Ognon River was deflected from the uplifted region, the Doubs incised deeply into it. Focused incision of the Doubs possibly sustained ongoing deformation along anticlines that were initiated during the Neogene evolution of the thin-skinned Jura fold-and-thrust belt. At present, this erosion-related active deformation is taking place synchronously with thick-skinned tectonics, controlling the inversion of the Rhine-Bresse Transfer Zone. This suggests local decoupling between seismogenic basement faulting and erosion-related deformation of the Mesozoic cover sequence.

    Grundlinien der Wirtschaftsentwicklung 2010/2011

    DIW Berlin expects economic growth of around two percent for both 2010 and 2011. The main driving forces come from domestic demand, which, with the exception of corporate investment, is supported to a large extent by government stabilization programs and by the automatic stabilizers. The most important pillar is private consumption, which benefits from considerable gains in households' purchasing power. For exports, no strong recovery is to be expected at first: owing to their specialization in capital goods and their still relatively small market share in the growth centers of the world economy, German exports are likely to participate more noticeably in the global upswing only with a delay, and thus not before next year. Although the number of unemployed will exceed the four-million mark next year, the decline in employment is comparatively mild given the preceding slump in production. This is made possible by weak productivity growth and an only gradual normalization of hours worked. At the same time, prices remain largely stable, with an inflation rate of around one percent; this, however, presupposes a calming of the commodity markets, which the forecast assumes. Overall, the setbacks caused by the severe economic crisis have not yet been overcome: only toward the end of 2011 is Germany's economic output likely to return to its mid-2008 level, i.e., the level before the dramatic slump in production. In purely arithmetical terms, this corresponds to more than three years of zero growth. For monetary policy, the question is when to exit the expansionary course.
Given the remaining uncertainties about the further economic recovery and the sustainability of the financial market stabilization, an only gradual withdrawal of the excessive liquidity supply is advisable, especially since the price stability objective is currently not at risk. The federal government's budgetary and fiscal policy must be viewed critically: its plans (cutting social security contributions, tax reform, health care reform, and compliance with the debt brake from 2016) may each have some justification on their own, but taken as a whole these measures cannot be realized simultaneously. This inconsistency in economic policy can contribute considerably to uncertainty among private households and firms. A stronger setting of priorities and a clearer overall concept are urgently needed here.
Economic outlook, business cycle forecast